ABSTRACT

We develop a new method to estimate the redshift of galaxy clusters through 
resolved images of the Sunyaev-Zel'dovich effect (SZE). Our method is based on
morphological observables which can be measured by current and future SZE experiments.
We test the method with a set of high resolution hydrodynamical simulations of galaxy
clusters at different redshifts. Our method combines the observables in a principal
component analysis. After calibrating the method with an independent redshift estimation
for some of the clusters, we show, using a Bayesian approach, how the method can
give an estimate of the redshift of the galaxy clusters. Although the error bars given 
by the morphological redshift estimation are large, it should be useful for future SZE 
surveys where thousands of clusters are expected to be detected; a first preselection of 
the high redshift candidates could be done using our proposed morphological redshift 
estimator. Although not considered in this work, our method should also be useful to 
give an estimate of the redshift of clusters in X-ray and optical surveys. 
1 INTRODUCTION

The advent of new experiments dedicated to the observation
of the Sunyaev-Zel'dovich effect (Sunyaev & Zel'dovich
1972; SZE hereafter) demands the development of new
techniques to best analyze these new and exciting data.
With the SZE it is possible to probe the hot plasma in galaxy
clusters, which shifts the spectrum of the cosmic background
radiation. This shift is redshift independent, and it is
proportional to the temperature of the plasma and its electron
density ($n_e$). This characteristic (z-independent, and $\propto n_e$)
makes the SZE an ideal way to explore the high redshift 
population of galaxy clusters. 

However, the fact that the SZE distortion is independent
of the redshift of the cluster makes the determination
of the redshift of the cluster a challenging task. Redshift
information is crucial if one attempts to use cluster
surveys to study the evolution of our universe. The evolution
of the cluster number counts ($dN/dz$) is a very sensitive
indicator of the cosmological model (Eke et al. 1998;
Mathiesen & Evrard 1998; Henry 2000). The local abundance
of clusters shows a degeneracy in $\Omega$-$\sigma_8$ (Eke et al.
1996; Bahcall et al. 1997), but this degeneracy can be broken
with an accurate estimation of $dN/dz$ up to moderate
or high redshifts ($z \sim 0.5 - 1$) (see e.g. Bahcall & Fan
1998; Borgani et al. 2001). The cluster redshift distributions
in suitably large SZE cluster surveys can potentially provide
precise constraints on the amount and nature of the dark energy
in the universe (Haiman et al. 2001; Holder et al. 2001;
Weller et al. 2001; Majumdar & Mohr 2002). Redshifts are
also necessary to study the evolution of the cluster structure
and dynamics.

One normally determines the redshift using photometric 
and spectroscopic observations of the galaxies in the cluster. 
Spectroscopic observations of galaxies in relatively nearby 
clusters are straightforward, but for distant clusters it is 
challenging even with the largest telescopes. For large solid 
angle surveys, photometric redshifts will be of critical im- 
portance, allowing redshift determination for far less time 
invested at the telescope. However, photometric redshifts are
also time consuming, and for clusters above redshift $z \approx 1$
they require large telescopes (see for instance
Diego et al. 2002, where the authors show the selection function
for a galaxy cluster survey with a 10-m telescope and
photometric redshift estimations). Future SZE experiments
will detect hundreds and perhaps thousands of galaxy clusters.
The Planck Surveyor SZE survey is expected to detect
more than $10^4$ clusters with redshifts extending to $z \sim 2$
(depending on the cosmological model; Diego et al. 2002). A
planned, arcminute resolution SZE survey from the South 
Pole will detect similar numbers of clusters with a much 
larger fraction at high redshift. 

Measuring redshifts for large solid angle, high redshift 
cluster surveys is a daunting task. An optimal solution may
be to combine small and medium-sized telescopes to de- 
termine the redshifts of the low and intermediate z clus- 
ters, reserving the redshift measurements of the most dis- 
tant clusters for the largest available telescopes. Clearly this 
strategy requires crude a priori knowledge of the cluster red- 
shifts. The motivation of the work described here is to ex- 
amine whether it is possible to make this preselection of the 
low, intermediate and high redshift clusters using SZE data 
alone. 

Our method is based only on observed SZE properties of 
galaxy clusters. These include the observed shape and size of 
the cluster, which do have some dependence on the redshift. 
For instance, the apparent size of a particular cluster will 
decrease when increasing its redshift. So, an apparent size 
will, in principle, constrain the cluster redshift. However, the 
apparent size of the cluster also depends on its total mass. 
Two clusters with different redshifts and masses can have 
the same apparent size, provided the more distant cluster 
has a larger mass that exactly compensates for the decrease 
in the apparent size due to the increased redshift. There 
is, therefore, a degeneracy between the cluster redshift and 
mass. 

The question is whether we can break this degeneracy 
by using additional information. A resolved SZE image of 
a cluster provides information not only about the cluster 
size, but also about the shape of the cluster gas distribu- 
tion. The total observed flux of the cluster, for instance, 
depends on the total cluster mass, the redshift and the tem- 
perature. The central SZE decrement depends on the core 
radius and the electron central density, but it is, in principle, 
independent of the redshift. Our method incorporates these 
and other observables to break the mass-redshift degener- 
acy. This method requires resolved SZE images. Therefore, 
it should be useful for arcmin and sub-arcmin resolution ex- 
periments but not for experiments like Planck, where the 
best resolution will be 5 arcmin. 

In this work we will not consider the effects of the rel- 
ativistic corrections and the kinematic effect, because they 
are small compared with the non-relativistic thermal SZE. 
Their effect will be discussed in a later paper. In §2 we outline
the connections between cluster morphology and redshift
from a structure formation viewpoint. §3 discusses some
of the weaknesses of this theoretical perspective and then
provides a detailed description of a method that overcomes
these weaknesses. A demonstration of the degree to which
the method works is contained in §4, and a discussion of
conclusions follows in the final section.



2 CONNECTIONS BETWEEN CLUSTER 
MORPHOLOGIES AND REDSHIFTS 

In this section we will use theoretical arguments to justify 
the use of morphological redshifts. We will develop a sim- 
ple analytic cluster SZE model and then use it to compute 
cluster observables. For the particular case of the isothermal
$\beta$-model in a cosmological model with $\Omega_m = 1$, we will
demonstrate in subsection 2.2 how combined measurements
of cluster size and flux lead directly to a redshift estimate.
Although this cosmological model is not consistent with current
data, its simplicity makes it useful for illustrating the
method, and the results are fully generalizable.

2.1 A Model for Cluster SZE Signatures 

The distortion in the CMB intensity due to thermal SZE is 

$\Delta I = I_0\, f(x)\, y_c$ (1)

where $I_0 \approx 2.7 \times 10^{11}$ mJy sr$^{-1}$, $x$ is the dimensionless frequency
($x = h\nu/k_B T \approx \nu(\mathrm{GHz})/56.8$), $f(x)$ is the frequency dependence
of the SZE, $f(x) = [x \coth(x/2) - 4] \times [x^4 e^x/(e^x - 1)^2]$,
and $y_c$ is the cluster Compton parameter:

$y_c = \frac{k_B \sigma_T}{m_e c^2} \int T\, n(l)\, dl$. (2)

This distortion is independent of cluster redshift because 
$x$ does not depend on the redshift (because both $\nu$ and $T$
depend on the redshift in the same way). Therefore, the SZE 
spectral distortion provides no redshift information about 
the cluster. 
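
As a quick numerical illustration of Eqn. 1, the following sketch evaluates the dimensionless frequency and the spectral function f(x). It assumes the approximation x ≈ ν(GHz)/56.8 quoted above; the function names and the bisection helper are illustrative only and are not part of the analysis described in this paper.

```python
import math


def dimensionless_frequency(nu_ghz):
    """x = h nu / (k_B T), using the approximation x ~ nu(GHz) / 56.8."""
    return nu_ghz / 56.8


def sze_spectral_function(x):
    """f(x) = [x coth(x/2) - 4] * [x^4 e^x / (e^x - 1)^2]."""
    coth_half = 1.0 / math.tanh(0.5 * x)
    return (x * coth_half - 4.0) * x ** 4 * math.exp(x) / math.expm1(x) ** 2


def null_frequency_ghz(lo=100.0, hi=300.0, tol=1e-6):
    """Bisection for the frequency at which f(x) changes sign (illustrative helper)."""
    f_lo = sze_spectral_function(dimensionless_frequency(lo))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = sze_spectral_function(dimensionless_frequency(mid))
        if f_lo * f_mid <= 0.0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for nu in (150.0, 353.0):
        x = dimensionless_frequency(nu)
        print(f"nu = {nu:5.1f} GHz  x = {x:.3f}  f(x) = {sze_spectral_function(x):+.3f}")
    print(f"f(x) changes sign near {null_frequency_ghz():.1f} GHz")
```

The sign of f(x) separates the low-frequency decrement from the high-frequency increment; none of these quantities depends on the cluster redshift.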

However, if we can produce a resolved SZE image of 
a cluster, we gain much more information and can po- 
tentially solve for the redshift. To illustrate this point, 
let us assume the simple case of a cluster at redshift z 
with an electron density profile described by a $\beta$-model
(Cavaliere & Fusco-Femiano 1978) with $\beta = 2/3$, which is
the value found to best match clusters (Jones & Forman
1984; Mohr et al. 1999). In this case the electron density
profile is just $n(r) = n_0 / (1 + (r/r_c)^2)$, where $n_0$ is the central
electron density and $r_c$ is the core radius, which we take
to be some constant fraction of the virial radius, $r_c = r_v/p$, where $p$
is a parameter with value ranging between 10 and 20. More
realistic models that include, for example, a mass dependent
$p$ could be considered, but for illustrative purposes
our simple model will suffice.

The redshift evolution of these parameters can be modeled as

= (3) 

where $\Delta_c(z)$ is the critical collapse overdensity with respect
to the critical density at redshift $z$ ($\Delta_c(z) = 18\pi^2 + 82x - 39x^2$
with $x = \Omega(z) - 1$ and $\Omega(z) = \Omega_m (1+z)^3/E(z)^2$) and
$H(z) = H_0 E(z)$ (see e.g. Bryan & Norman 1998; Mohr et al.
2000). $n_0$ is adjusted to fix the total gas mass $M_{gas} =
\mu_e m_p \int_0^{r_v} n(r)\, 4\pi r^2 dr \approx f_b M$, where $M$ is the virial mass
and $f_b$ is the baryon fraction. For a fully ionized, purely
hydrogen gas $\mu_e = 1$. Similarly, for the virial radius we have

Using the virial theorem ($2K + V = 0$) and the spherical
collapse model, we can obtain an expression for the virial
temperature of the cluster:

$T = T_0 M_{15}^{2/3} \left[ \Delta_c(z) E(z)^2 / \Delta_c(0) \right]^{1/3}$. (5)

The normalization, $T_0$, can be obtained from models or from
a fit of this relation to the data. We will adopt the second
approach and use the values derived in Diego et al. (2001).
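
The collapse overdensity fit quoted above is straightforward to evaluate. The sketch below does so under the assumption of a spatially flat universe, for which E(z)^2 = Ω_m (1+z)^3 + (1 - Ω_m); that form of E(z), and the function names, are assumptions made here for illustration.

```python
import math


def e_squared(z, omega_m):
    """E(z)^2 for a flat universe (assumption): Omega_m (1+z)^3 + (1 - Omega_m)."""
    return omega_m * (1.0 + z) ** 3 + (1.0 - omega_m)


def omega_of_z(z, omega_m):
    """Matter density parameter at redshift z: Omega_m (1+z)^3 / E(z)^2."""
    return omega_m * (1.0 + z) ** 3 / e_squared(z, omega_m)


def delta_c(z, omega_m):
    """Collapse overdensity fit: 18 pi^2 + 82 x - 39 x^2 with x = Omega(z) - 1."""
    x = omega_of_z(z, omega_m) - 1.0
    return 18.0 * math.pi ** 2 + 82.0 * x - 39.0 * x ** 2


if __name__ == "__main__":
    for z in (0.0, 0.5, 1.0, 1.5):
        print(f"z = {z:3.1f}  Delta_c(Omega_m=1) = {delta_c(z, 1.0):7.2f}  "
              f"Delta_c(Omega_m=0.3) = {delta_c(z, 0.3):7.2f}")
```

For Ω_m = 1 the overdensity stays at 18π² at every redshift, while for a low-density flat model it starts lower at z = 0 and approaches 18π² at high redshift.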

Within this model, the Compton parameter in the
direction $\theta$ is



[Figure 1 axis label: Isophotal & Total Flux (mJy at 300 GHz)]

Figure 1. Ideal vs real case. The dotted lines represent the
expected cluster total size-flux relation in the ideal case where the
cluster is observed with infinite resolution and sensitivity. Each
line corresponds to a different redshift (listed on the left) for a
range of masses $3 \times 10^{13} - 1 \times 10^{15}\, h^{-1} M_\odot$. The top line is for $z = 0.05$
while the bottom line is for $z = 1.5$. The symbols show the case in real
life (simulations) where one observes isophotal quantities and the
scaling relations differ from the case of the toy model. Note that
in this case, the characteristic mass as a function of redshift for
the simulated clusters is decreasing with redshift. This, and the
fact that we are measuring isophotal sizes instead of virial sizes,
explains the shift to the left with redshift.



Figure 2. The ideal case. If the clusters follow exactly the scaling
relations of equations 9 and 10, then the recovered redshift
could be as good as the one shown in this figure. The solid line
shows the Gaussian pdf of the recovered redshift for an experi- 
ment with 10 arcsec FWHM. In the computation of the pdf we 
have assumed that the flux is measured with no error and the 
diameter has an uncertainty given by the FWHM of the experi- 
ment. The dotted line is the mass as a function of redshift for a 
given flux, and the dashed line is the same for a given diameter. 
This plot is for a cluster with an apparent radius of 4 arcmin and 
a total flux of 100 mJy (at 353 GHz). 



$y_c(\theta) = \frac{k_B \sigma_T}{m_e c^2}\, r_c n_0 T\, \Phi(\theta) = y_0\, \Phi(\theta)$ (6)

where $\theta$ is the angle between the line of sight and the center
of the cluster. We have assumed here that $T$ is constant, and
we have ignored any contributions to $y_c$ from outside the
cluster virial region. The function $\Phi(\theta)$ is just the integral
of the density profile along the line of sight:

$\Phi(\theta) = \frac{2}{\sqrt{1 + (\theta/\theta_c)^2}} \tan^{-1} \sqrt{\frac{p^2 - (\theta/\theta_c)^2}{1 + (\theta/\theta_c)^2}}$ (7)

where $\theta_c$ is the apparent core radius $r_c/d_A$. The cluster surface
brightness profile is then

$B(\theta) = B_0\, r_c n_0 T\, \Phi(\theta)$ (8)

where (see Eqn. 1) $B_0 = I_0 f(x)\, (k_B \sigma_T / m_e c^2)$. Using this
model we will now show how cluster morphologies contain
information about the cluster redshift.
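
For the $\beta = 2/3$ profile $n(r) = n_0/(1 + (r/r_c)^2)$ truncated at $r_v = p\, r_c$, the line-of-sight integral has a closed form. The sketch below compares that closed form with a direct numerical integration, working in units of the core radius (u = θ/θ_c). The function names and the midpoint-rule integrator are illustrative choices made here, not the authors' code.

```python
import math


def phi_closed_form(u, p):
    """Projected profile for beta = 2/3; u = theta/theta_c, p = r_v/r_c."""
    if abs(u) >= p:
        return 0.0  # line of sight misses the truncated cluster
    a = math.sqrt(1.0 + u * u)
    return 2.0 / a * math.atan(math.sqrt(p * p - u * u) / a)


def phi_numerical(u, p, steps=20000):
    """Midpoint-rule integral of n(r)/n_0 along the line of sight, in units of r_c."""
    if abs(u) >= p:
        return 0.0
    half_length = math.sqrt(p * p - u * u)  # path length inside the virial sphere
    dl = 2.0 * half_length / steps
    total = 0.0
    for i in range(steps):
        l = -half_length + (i + 0.5) * dl
        total += dl / (1.0 + u * u + l * l)
    return total


if __name__ == "__main__":
    p = 10.0
    for u in (0.0, 0.5, 1.0, 3.0, 9.0):
        print(f"u = {u:3.1f}  closed = {phi_closed_form(u, p):.6f}  "
              f"numerical = {phi_numerical(u, p):.6f}")
```

At the cluster center the closed form reduces to 2 arctan(p), which approaches π for large p.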



2.2 Redshifts from Total Flux and Apparent Size
in the $\Omega_m = 1$ Case

We now apply the model developed above to demonstrate 
that in principle two simple observables provide enough in- 
formation to estimate cluster redshifts. We specifically use 
the evolution model appropriate for $\Omega_m = 1$ only for simplicity.
In this simple case, the general expressions in the previous
section simplify, since $\Delta_c(z) = \Delta_c(0) = 18\pi^2$
and $E(z)^2 = (1 + z)^3$.

The total cluster flux is just the integral of the SZE
distortion or surface brightness over the entire solid angle
of the cluster, $S = \int B(\theta)\, d\Omega$. Taking a step back and noting
that the surface brightness is a line integral, it is clear
that the total SZE flux is simply an integral over the cluster
volume. Writing the volume element $dV(z) = d\Omega\, d_A(z)^2\, dl$,
where $d_A$ is the angular diameter distance, we then see
that $S \propto \int d\Omega \int dl\, T n = d_A^{-2} \int dV\, T n \propto T M_{gas}\, d_A^{-2}$. The
total cluster flux is an interesting quantity, depending on
the density weighted temperature $T$, the total gas mass
(but not its detailed distribution) and the cluster distance.
In the isothermal case the total flux at the frequency $x$ is
$S(x) = S_0 (f_b T M_{15}) / d_A^2$, where $S_0 \approx 3.781 \times f(x)$ mJy. We
take the baryon fraction to be consistent with SZE observations
($f_b \approx 0.08\, h^{-1}$; Grego et al. 2001). The mass $M_{15}$ is
expressed in units of $10^{15}\, h^{-1} M_\odot$. In these units, the $h^{-2}$
dependence of $d_A^2$ cancels with the $h$-dependence of $f_b$ and
$M_{15}$, making the flux $h$-independent. If we substitute mass
for temperature (equation 5), we end up with an expression
which only depends on the mass and the redshift:



$S = \frac{S_0 T_0 f_b M_{15}^{5/3} (1 + z)}{d_A^2}$ (9)



The total flux of the cluster depends on its redshift through 
the angular diameter distance and inherent evolution of clus- 
ter structure. 
The apparent size of the cluster, $\theta_{cl}$, is another quantity
which strongly depends on the cluster redshift. If we consider
that the physical size of the cluster is related to its virial
radius, then its observed apparent size is (see Eqn. 4):

$\theta_{cl} = \frac{2 R_0 M_{15}^{1/3}}{d_A (1 + z)}$ (10)

where a cluster of virial mass $M = 1 \times 10^{15} M_\odot$ has virial
radius $R_0$ Mpc at redshift $z = 0$. If we compare Eqns. 9 and
10 we can see that both depend only on the redshift and
cluster mass (assuming that the other parameters $S_0$, $T_0$, $f_b$
and $R_0$ are fixed by local observation). Thus, in principle,
measurements of total flux $S$ and apparent size $\theta_{cl}$ provide
both the cluster mass and redshift.

Fig. 1 contains a plot of the correlation between total
flux and apparent size at different redshifts using the model
developed above. Each line in this plot represents a single
redshift and a range of cluster mass between $3 \times 10^{13}\, h^{-1} M_\odot$
(left) and $1.0 \times 10^{15}\, h^{-1} M_\odot$ (right). The redshifts are, from
top to bottom, z = 0.05,0.1,0.25,0.5,0.75,1,1.5. This plot 
shows clearly that different redshifts are well separated, al- 
lowing one to solve for redshift and mass with observations 
of the total flux and apparent size. The symbols show the 
case of simulated data. In this case, the clusters do not fol- 
low the scaling relations of the previous model. Note that the 
sizes of the simulated clusters are isophotal sizes rather than 
virial sizes, because there is no clear observational signature 
of the virial region in the SZE properties of a cluster. 

Both Eqns. 9 and 10 can be solved for the mass $M$. Figure 2
contains a plot of $M$ versus $z$ for specific values of the total
flux $S$ (dotted line) and the apparent size $\theta_{cl}$ (dashed line).
These two functions intersect at the real redshift of the cluster.
This intersection along with the measurement of cluster
size could be used to produce a probabilistic statement
about the cluster redshift. This is illustrated by the solid
line in Fig. 2, which shows a Gaussian probability distribution
for the cluster redshift that reflects the uncertainties
in the measured apparent size. These uncertainties are modelled
as a function of the spatial resolution of the experiment
(FWHM), which introduces an uncertainty in the observed
apparent size and consequently in the derived redshift.
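
The intersection argument can be sketched numerically. The example below assumes the $\Omega_m = 1$ scalings discussed in this subsection, $S \propto M^{5/3}(1+z)/d_A^2$ and $\theta_{cl} \propto M^{1/3}/[d_A(1+z)]$, together with the Einstein-de Sitter angular diameter distance $d_A = (2c/H_0)[1 - (1+z)^{-1/2}]/(1+z)$. The normalizations FLUX_NORM and SIZE_NORM are hypothetical placeholders (set to 1 in arbitrary units) standing in for $S_0 T_0 f_b$ and $2R_0$; the bisection solver is an illustrative choice.

```python
import math

FLUX_NORM = 1.0  # hypothetical stand-in for S_0 T_0 f_b (arbitrary units)
SIZE_NORM = 1.0  # hypothetical stand-in for 2 R_0 (arbitrary units)


def d_a(z):
    """Angular diameter distance for Omega_m = 1, in units of c/H_0."""
    return 2.0 * (1.0 - 1.0 / math.sqrt(1.0 + z)) / (1.0 + z)


def total_flux(mass, z):
    return FLUX_NORM * mass ** (5.0 / 3.0) * (1.0 + z) / d_a(z) ** 2


def apparent_size(mass, z):
    return SIZE_NORM * mass ** (1.0 / 3.0) / (d_a(z) * (1.0 + z))


def mass_from_flux(flux, z):
    """Invert the flux scaling for the mass at a trial redshift."""
    return (flux * d_a(z) ** 2 / (FLUX_NORM * (1.0 + z))) ** 0.6


def mass_from_size(size, z):
    """Invert the size scaling for the mass at a trial redshift."""
    return (size * d_a(z) * (1.0 + z) / SIZE_NORM) ** 3


def solve_redshift(flux, size, z_lo=1e-3, z_hi=5.0, tol=1e-10):
    """Bisection for the redshift where the two mass curves intersect."""
    def mismatch(z):
        return math.log(mass_from_flux(flux, z)) - math.log(mass_from_size(size, z))

    f_lo = mismatch(z_lo)
    if f_lo * mismatch(z_hi) > 0.0:
        raise ValueError("no intersection inside the redshift bracket")
    while z_hi - z_lo > tol:
        z_mid = 0.5 * (z_lo + z_hi)
        f_mid = mismatch(z_mid)
        if f_lo * f_mid <= 0.0:
            z_hi = z_mid
        else:
            z_lo, f_lo = z_mid, f_mid
    z = 0.5 * (z_lo + z_hi)
    return z, mass_from_flux(flux, z)


if __name__ == "__main__":
    flux, size = total_flux(0.3, 0.5), apparent_size(0.3, 0.5)
    print(solve_redshift(flux, size))
```

Under these scalings the difference between the two mass curves is monotonic in redshift, so the intersection is unique within the bracket. With real data, scatter about the scaling relations would broaden this point estimate into a distribution.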



3 A METHOD FOR ESTIMATING REDSHIFTS 

The arguments outlined in the previous section are ideal- 
ized. In a real experiment, the situation would depart from 
that described above for several reasons: (i) clusters do not
follow the scaling relations of Eqns. 5 and 10 perfectly, because
of departures from equilibrium and variation in cluster
structure due to ongoing merging, which is quite common
(e.g. Mohr et al. 1995); (ii) a real experiment is affected by
instrumental noise and limited by sensitivity, which limits
one's ability to estimate the total flux and apparent size.
Estimating cluster redshifts from SZE observations is then
a much more complicated task in practice (see symbols in
Fig. 1); nevertheless, the underlying scaling outlined in the
previous section is expected to be a good description of the 
cluster population in a statistical sense. Therefore, here we 
describe an empirical method for estimating redshifts us- 
ing SZE morphology and calibration through direct redshift 
measurement in a subsample of the clusters. 



In developing this method we are guided by several crit- 
ical realizations: 

• In the ideal case we have assumed that all clusters lie
perfectly on self-similar scaling relations (Eqns. 5 and 10).
There are observed scaling relations in galaxy clusters
which connect, for example, X-ray luminosity to tempera- 
ture and virial mass to temperature. But even in these two 
well known cases, the scaling relations have an intrinsic scat- 
ter, and there is still an ongoing debate about the exact form 
of these relations. Thus, any redshift estimator that employs 
scaling relations must allow for scatter and must not require 
that the exact form of the scaling relation be known. 

• Due to noise sources and the limited instrument sensi- 
tivity, it will not be possible to observe the entire extent of 
a cluster. Therefore, it will be difficult to estimate the total 
flux and size of the cluster from the observed signal. One 
alternative is to work directly with the observed quantities 
like the isophotal flux and isophotal size, where the isophote 
is chosen to lie well above the noise limits of the data. Thus, 
our method must work with readily available observational 
quantities. 

• X-ray observations indicate that cluster gas distributions
can be reasonably well approximated with $\beta$-models,
but important departures remain (e.g. Mohr et al. 1999).
Observations also indicate that clusters are not isothermal
(e.g. Markevitch et al. 1998), which is no surprise given
the prevalence of merging (and/or temperature gradients).
Thus, our method will have to allow for the fact that cluster
structure varies significantly from system to system.

• In practice, observations may provide significantly more 
information than is contained in the isophotal flux or size 
alone. Therefore, we need to develop a method that can
handle multiple observables, even redundant ones,
in an optimal and graceful manner.

3.1 SZE Observables 

The idea behind morphological redshift estimation is that by 
combining many observables taken from the 2D SZE cluster 
profile it is possible to divide the clusters into different groups,
each one corresponding to a different redshift interval. The observables
must be such that they take into account the last points of 
the previous section. The list below is not exhaustive, but it 
includes all the observables used in the following section in 
our attempt to estimate cluster redshifts. 

• Isophotal size. The apparent isophotal size (mean diameter)
$\theta_I$ is given by the following expression

$\theta_I = 2\sqrt{A/\pi}$ (11)

where $A$ is the total (apparent) area enclosed by the isophote
(Mohr & Evrard 1997). In this work, we will use an isophote
defined to be well above the instrument sensitivity. This sensitivity
defines a threshold in the 2D images. The X-ray
isophotal size exhibits a tight correlation with the emission
weighted X-ray temperature both in local samples
(Mohr & Evrard 1997) and in intermediate redshift samples
(Mohr et al. 2000). Current SZE observations do not have
the required sensitivity to examine this property, but many
future experiments will have sufficient sensitivity.

• Isophotal flux. The isophotal flux is just the total flux 
within the isophote. 
$S_I = \int_{A} B(\theta)\, d\Omega$ (12)



This quantity has never been examined in X-ray observa- 
tions of ensembles of clusters, but in hydrodynamical simu- 
lations this quantity appears to be strongly correlated with 
the isophotal size. Nevertheless, we include this quantity, 
because it may provide additional information, and our 
method handles redundant observables gracefully. 

• Central amplitude. For the case of the $\beta$-model described
above the central amplitude is:

$A = B_0\, r_c n_0 T\, \Phi(0)$ (13)

The central amplitude only depends on the redshift through
the intrinsic cluster evolution ($z$-dependence of $r_c$, $n_0$ and
$T$). The central amplitude is the integrated effect of the projected
electron population through the cluster center. If one
assumes the scaling relations given in Eqns. 4 and 5, the central
decrement is directly proportional to the total cluster
mass. The situation will be more complicated in real clusters,
but the example of the $\beta$-model is useful to illustrate
the utility of the central decrement.

• First and second derivatives of the SZE profile.
The first and second derivatives of the observed brightness
profile for the $\beta$-model (see Eqns. 7 and 8) are shown in
Figure 3, where we also show the projected density profile
(proportional to $\Phi(\theta)$) for comparison. The curves have been
renormalized by their respective maxima. The core radius
in this case was $\theta_c = 0.5$ arcmin, and $p = 10$.

The second derivative evaluated at the cluster center ($\theta = 0$) is

$\frac{d^2 B(\theta)}{d\theta^2}(0) = -\frac{B_0\, r_c n_0 T}{\theta_c^2} \left[ \Phi(0) + \frac{2}{p} \right]$ (14)

which depends on the redshift through the evolution of $\theta_c$.
In the regime of interest ($p \gg 2$), the second derivative in
the center is a tracer of the core radius.

The first derivative in the center is null for all the clusters,
but it is clear in Fig. 3 that the first derivative reaches a
maximum at $\theta \approx \theta_c$. The position of the maximum coincides
with the region where the second derivative vanishes. The
value of the first derivative at the core radius is:

$\frac{dB(\theta)}{d\theta}(\theta_c) \propto \frac{B_0\, r_c n_0 T}{\theta_c} = B_0\, n_0 T\, d_A$ (15)



That is, the first derivative in the region where the second 
derivative vanishes is independent of the core radius and 
is proportional to the angular diameter distance. In cases 
where the core radius is proportional to the mass, the sec- 
ond derivative in the center and the first derivative in the 
region where the second derivative vanishes should be useful 
in breaking the degeneracies between the cluster mass and 
redshift. 

• Minkowski functionals. Several recent works have
suggested that Minkowski functionals applied to galaxy
clusters should be good tracers of the cluster evolution
(Beisbart et al. 2001). If clusters form by merging events,
their internal structure should evolve with redshift (although 
local clusters are known to exhibit lots of evidence for merg- 
ing). Morphological evolution can in principle be traced with 
Minkowski functionals. In this paper we will consider three 
of them: (i) total perimeter of the isophote, (ii) ellipticity 
of the isophote and (iii) number of subgroups above the 
Figure 3. Projected density profile (solid line) as a function of
radius. Also shown are the first derivative (dotted line) and the
second derivative (dashed line). All curves have been renormalized.
The core radius is $\theta_c = 0.5$ arcmin and the ratio between
the virial and core radius is $p = 10$.



isophote. In a recent paper employing Minkowski functionals,
the authors claim that some evolution with redshift in
the ellipticity of galaxy clusters can be observed in large
optical and X-ray samples (Plionis 2003).

• Wavelet coefficients. The Mexican-hat wavelet
(MHW) is the second derivative of a Gaussian, and it has
been proposed as an ideal filter for compact source subtraction
(e.g. Cayón et al. 2000; Vielva et al. 2001). Although
the use of the MHW and the number of coefficients (scales)
is somewhat arbitrary, we will include them just to show
that the inclusion of more observables does not pose problems
for this method. We take the 3 MHW coefficients at
the center of the cluster, which produces coefficients that
are highly correlated with the second derivative. By changing
the scale of the MHW we are sampling the cluster at
different radii. The three scales considered in this work are
$s = 0.25, 0.75, 1.66$ arcmin.
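
Several of the observables listed above can be measured directly from a pixelized map. The sketch below is a minimal illustration under stated assumptions: the map holds a positive surface brightness per unit solid angle, the cluster center is taken to be the brightest pixel, and the Mexican-hat kernel is left unnormalized. The function names and these conventions are choices made here for illustration, not the measurement pipeline used in this work.

```python
import numpy as np


def isophotal_observables(image, pixel_size, threshold):
    """Return (isophotal size, isophotal flux, central amplitude) of a 2D map.

    image      : 2D array of surface brightness (assumed positive signal)
    pixel_size : pixel side in angular units (e.g. arcmin)
    threshold  : isophote level, chosen well above the noise
    """
    image = np.asarray(image, dtype=float)
    inside = image >= threshold
    pixel_area = pixel_size ** 2
    area = inside.sum() * pixel_area              # area enclosed by the isophote
    size = 2.0 * np.sqrt(area / np.pi)            # mean diameter, Eqn. 11
    flux = image[inside].sum() * pixel_area       # flux within the isophote, Eqn. 12
    amplitude = image.max()                       # assumption: peak marks the center
    return float(size), float(flux), float(amplitude)


def mexican_hat_coefficient(image, pixel_size, scale, centre):
    """Unnormalized 2D Mexican-hat coefficient at pixel `centre` = (row, col)."""
    image = np.asarray(image, dtype=float)
    rows, cols = np.indices(image.shape)
    r2 = ((rows - centre[0]) ** 2 + (cols - centre[1]) ** 2) * pixel_size ** 2
    u = r2 / scale ** 2
    kernel = (2.0 - u) * np.exp(-0.5 * u)         # zero-mean kernel
    return float((image * kernel).sum() * pixel_size ** 2)


if __name__ == "__main__":
    rows, cols = np.indices((65, 65))
    r2 = (rows - 32) ** 2 + (cols - 32) ** 2
    toy = 1.0 / np.sqrt(1.0 + r2 / 25.0)          # toy centrally peaked map
    print(isophotal_observables(toy, 0.25, 0.4))
    print([mexican_hat_coefficient(toy, 0.25, s, (32, 32)) for s in (0.25, 0.75, 1.66)])
```

Because the kernel has zero mean, a constant background contributes nothing to the wavelet coefficient, which is one reason such a filter is convenient for compact structures.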



3.2 Principal Component Analysis 

Principal component analysis (PCA hereafter) has been
widely used in recent years as a powerful classifier of
data sets (e.g. Deeming 1964; Teuber et al. 1979; Whitney
1983; Ronen et al. 1999). For our particular case, PCA
has several advantages, which can be briefly
summarized as follows: (i) PCA produces an optimal
linear combination of the observables that maximizes the
variance of the linear combination (or projection), (ii)
there is no limit on the number of observables, (iii) it is a
non-parametric method, which means no assumptions about
the cluster scaling relations are required, (iv) the principal
components returned by PCA are by construction uncorrelated,
which simplifies their use in the final computation of
the redshift of the cluster.

In this paper we will give only a brief description of 
PCA. The reader is referred to the abundant literature about 
PCA for a more detailed description of the method. 
Let us consider a multivariate data set formed by N obser- 
vations, each observation producing m observables. In our 
particular case the observations will be the resolved SZE 
images and the observables will be the morphological quantities
derived from each one of the images. This data set
can be considered as an array of $N$ elements in a space of
$m$ dimensions. That is, each observation has $m$ coordinates.
Hereafter, we will refer to our data set as the matrix $X_{Nm}$.
The idea of PCA is that, in many cases, it is possible to reduce
the dimensionality of the problem without losing any
significant amount of information. PCA is especially powerful
in those cases where there are correlations between some
of the observables. In this case, the dimensionality of the
problem is reduced by projecting the entire data set onto
a new orthogonal coordinate system which is aligned with
the direction of the main correlations in the $\Re^m$ space. The
direction of the correlations in $\Re^m$ can be found by minimizing
the sum of squared distances between the data points and the
direction of the correlation. However, this is equivalent to 
maximizing the variance of the data points when projected 
onto the direction of the correlation. This is what PCA does. 

Finding the principal components reduces to an eigenvalue
problem (eigenvectors and eigenvalues of the covariance
matrix $S = X_{Nm}^{T} \times X_{Nm}$). The information carried
by each one of the principal components (eigenvectors) is
proportional to the value of its associated eigenvalue. So, the
eigenvector with the highest eigenvalue contains the highest
amount of information. In contrast, the eigenvectors
associated with the lowest eigenvalues do not retain much
useful information and they can be dominated by the noise
in the data. One can, therefore, consider only the eigenvectors
associated with the highest eigenvalues to compress the
data set. In our case, the $m$ observables for each cluster will
be compressed into $p$ principal components. The criterion
to choose the value of $p$ is given by the percentage of the total
variance retained by the $p$ highest eigenvalues. Usually a
good criterion is to retain only those eigenvalues for which
this percentage is about 90-95%. For this particular
application, we will see in the next section how we can
retain approximately 90% of the variance with only the
first three principal components. As we will see, our list of
observables is highly redundant!
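
The compression step described above can be sketched in a few lines. The example assumes that each observable is first standardized (mean subtracted and divided by its dispersion) so that quantities with different units can be combined; that preprocessing, the function name, and the 90% default are assumptions made here for illustration.

```python
import numpy as np


def principal_components(X, variance_fraction=0.90):
    """Project an (N, m) data matrix onto its leading principal components.

    Returns (components, eigenvalues, retained), where `components` has shape
    (N, p) and `retained` holds the cumulative variance fraction of the first
    p eigenvalues, with p the smallest number reaching `variance_fraction`.
    """
    X = np.asarray(X, dtype=float)
    # Standardize each observable (assumption: observables carry different units).
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    cov = Z.T @ Z / (Z.shape[0] - 1)              # (m, m) covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    p = min(int(np.searchsorted(cumulative, variance_fraction)) + 1, len(eigvals))
    return Z @ eigvecs[:, :p], eigvals, cumulative[:p]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a, b = rng.normal(size=500), rng.normal(size=500)
    # Five redundant "observables" driven by only two underlying variables.
    X = np.column_stack([a, b, a + b, a - b, 2 * a + b])
    comps, eigvals, retained = principal_components(X)
    print(comps.shape, np.round(retained, 4))
```

In this toy data set five observables are built from two underlying variables, so two principal components carry essentially all of the variance. This mirrors the redundancy argument made in the text.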



3.3 Redshift estimation

The final component is the probability distribution for the
cluster redshift $z$ given the data $d$. We use Bayes' theorem
for this purpose:

$P(z/d) \propto P(z)\, P(d/z)$ (16)

where $P(z)$ is known as the prior; it provides the probability
of any cluster to be at redshift $z$. This prior is cosmologically
dependent, and it is in principle well defined if one
knows the selection function of the survey. For simplicity, we
will consider a constant prior in this work; however, when
estimating cluster redshifts in a real survey where a good
estimate of the cluster redshift distribution is known (i.e.
$P(z) \propto dN/dz$), this information should also be included
when making cluster redshift estimates. The second term,
$P(d/z)$, is known as the likelihood of the data. In our case,
the data are the three principal components. Because these
components are orthogonal by construction, we model the
likelihood as:

$P(d/z) = P(pc_1/z)\, P(pc_2/z)\, P(pc_3/z)$ (17)

where $pc_i$ is the $i$th principal component. Each one of these
individual probabilities, $P(pc_i/z)$, gives the probability of
the observed principal component, $pc_i$, to be associated with
the redshift $z$.

Computing accurate $P(pc_i/z)$ is absolutely critical, because
errors would likely lead to biases in the redshift estimates.
The safest approach is a process that we call self-calibration,
which requires an observed training set of clusters
with independently known redshifts. Such a training set
could be arranged by simply carrying out a portion of the
SZE survey in regions of the sky that have been spectroscopically
observed as part of the SDSS or 2dF surveys. At
high $z$, our calibration method will be limited by the availability
of identified clusters at those redshifts, and a follow-up
of these clusters will be needed. With measured redshifts
for some of the clusters, we can compute $P(pc_i/z)$ for each
principal component over a range of redshifts. Here we will
model the pdf as a Gaussian with two free parameters, the
mean value of the principal component at redshift $z$, $\overline{pc}_i(z)$,
and its dispersion at the same redshift, $\sigma_{pc_i}(z)$. With this
form the likelihood for $pc_i$ is

$P(pc_i/z) = \exp\left[ -\frac{(pc_i - \overline{pc}_i(z))^2}{2\sigma_{pc_i}^2(z)} \right]$ (18)

Other probability distributions (e.g. Poisson, $\chi^2$) could be
used, and if the training set were large enough, one could
use the histogram (pdf) of the principal components directly.

As a first application of our method, we have applied
PCA to the toy model of subsection 2.1 with 5 observables:
total flux, total size, central amplitude, first derivative and
second derivative. The result is that only the first two resulting
principal components are relevant. The first one retains
72.46% of the total variance and the second one retains
27.53%. This is not surprising since there are only two
independent variables in the toy model (redshift and mass).
The first PC is dominated by the two derivatives and the
total size, while the second PC is dominated by the central
amplitude and the total flux. The recovered redshift is unbiased
and the errors are small ($1\sigma$ error less than 10%).



4 APPLICATION TO SZE IMAGES FROM 
HYDRODYNAMICAL SIMULATIONS

In this section we will apply our method to SZE images 
of simulated galaxy clusters. But first we provide a brief 
description of the simulations. 

4.1 Simulations 

The clusters were simulated with a combined dark matter
and hydrodynamics code in a cosmological-constant dominated
universe ($\Omega_m = 0.3$, $\Lambda = 0.7$, $h = 0.7$,
$\Omega_b = 0.026$ and $\sigma_8 = 0.928$). The dark matter was
modeled with an adaptive particle-mesh method, while the gas
was followed with an adaptive mesh refinement technique
(Bryan 1999; Norman & Bryan 1999). This grid-based method adds
additional meshes in high-density regions to obtain high
resolution in the central regions of clusters, while low-density
regions are simulated at low resolution in order to keep the
CPU requirements modest. The best resolution so obtained
is 16 kpc, with generally about 100,000 particles within the
virial radius of each cluster. The simulations do not include
cooling and star formation because of the uncertainty in
modelling these processes. This somewhat reduces the realism of
the simulations, but because we are only testing the method
with the simulated clusters, not calibrating it, this should
not affect our conclusions substantially.

Figure 4. A sample of simulated clusters with different masses
and redshifts. From top to bottom: upper row, two clusters with
the same redshift (z = 0.25) but different masses; next row, two
clusters with redshifts 0.75 (left) and 0.5 (right) and different
masses; next row, two clusters with the same redshift (z = 0.1)
but different masses; bottom row, two clusters with redshifts 1.0
(left) and 1.5 (right) and different masses. The dimension of this
image is 80 arcmin in the horizontal direction and 160 arcmin in
the vertical direction.

Clusters used in this study are taken from a volume-limited,
simulated sample and range in mass from 0.7 to
$2.0 \times 10^{15} M_\odot$ at z = 0. Maps of the Compton
parameter $y_c$ are generated, some of which are shown in Figure 2.
The same clusters are imaged at a variety of redshifts,
so the various redshift samples are not fully independent
(this makes our conclusions conservative in the sense that it
should decrease the observed differences and hence make it
more difficult to separate the various redshifts than with a
real observational sample). The clusters are imaged at the
following discrete redshifts: z = 0.05, 0.1, 0.25, 0.5, 0.75, 1.0,
and 1.5.

Figure 5. The data set $X_{Nm}$ projected onto the first three
principal components. Bluer clusters (larger dots) are low redshift
ones, red points (smallest size) are highest redshift, green/yellow
clusters (intermediate sizes) lie in between (intermediate z).

4.2 Observations 

We filter the images with a Gaussian filter (FWHM 25 arcsec)
to simulate the effect of the finite instrument resolution.
This resolution is achievable with some current experiments
(Pointecouteau et al. 2001), but achieving it is not
straightforward. For instance, it requires a single-dish
antenna of 30 m diameter, like IRAM, working at ~3 mm.
Interferometers can also produce images with this resolution.
The instrument sensitivity is included by setting a threshold
on the filtered Compton parameter images. The sensitivity of
the experiment to the SZE signal depends mainly on the
instrumental noise and the confusion noise (mainly due to
primordial CMB and point sources). The confusion noise can be
reduced with multifrequency experiments, which allow partial
subtraction of the CMB component. Higher angular resolution
observations, which are easily carried out by existing
interferometers with long baselines, dramatically reduce the
point source noise contribution.
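
As a rough check of the single-dish figure quoted above, the diffraction limit can be estimated with the Rayleigh criterion, 1.22 λ/D. The sketch below also converts the 25 arcsec FWHM of the Gaussian filter into the corresponding standard deviation. The function names are ours, and the Rayleigh criterion is only an approximation to the true beam of any given telescope.

```python
import math

ARCSEC_PER_RADIAN = 180.0 / math.pi * 3600.0

def diffraction_limit_arcsec(wavelength_m, dish_diameter_m):
    """Rayleigh-criterion resolution, 1.22 * lambda / D, in arcsec."""
    return 1.22 * wavelength_m / dish_diameter_m * ARCSEC_PER_RADIAN

def fwhm_to_sigma(fwhm):
    """Convert a Gaussian FWHM to its standard deviation (same units)."""
    return fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))

# 30 m dish observing at 3 mm: roughly 25 arcsec.
theta = diffraction_limit_arcsec(wavelength_m=3.0e-3, dish_diameter_m=30.0)
print(round(theta, 1))

# Standard deviation of the 25 arcsec FWHM Gaussian filter, in arcsec.
print(round(fwhm_to_sigma(25.0), 2))
```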

Here we assume that we can see only the cluster emission
above a specific threshold, $y_c^{th} = 8.0 \times 10^{-6}$.
This threshold defines the isophote used for the isophotal
size and flux. With this threshold, we lose some
of the high redshift (z = 1.5) clusters, which after filtering
have a surface brightness below the threshold, but we can
still observe most of our simulated high-z clusters.
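
The filtering and thresholding steps can be sketched as follows. This is only an illustration under stated assumptions: the cluster is a hypothetical isothermal beta-model with invented parameters, the pixel size is arbitrary, and the isophotal size and flux are taken here to be the area and the integrated Compton parameter above the threshold, which may differ in detail from the definitions used for the simulated maps.

```python
import numpy as np

def gaussian_kernel(fwhm_pix):
    """1D Gaussian kernel, normalized to unit sum, for a FWHM in pixels."""
    sigma = fwhm_pix / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    half = int(np.ceil(4.0 * sigma))
    x = np.arange(-half, half + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def smooth(image, fwhm_pix):
    """Separable Gaussian smoothing (rows, then columns), zero-padded edges."""
    k = gaussian_kernel(fwhm_pix)
    rows = np.apply_along_axis(np.convolve, 1, image, k, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="same")

def isophotal_observables(y_map, y_threshold, pixel_arcsec):
    """Area and integrated y above the threshold, both in arcmin^2 units."""
    mask = y_map > y_threshold
    pixel_area = (pixel_arcsec / 60.0) ** 2
    size = mask.sum() * pixel_area
    flux = y_map[mask].sum() * pixel_area
    return size, flux

# Hypothetical beta-model cluster (beta = 2/3); all parameters are made up.
pixel_arcsec = 5.0
n = 257
yy, xx = np.indices((n, n)) - n // 2
theta = np.hypot(xx, yy) * pixel_arcsec            # radius in arcsec
y0, theta_core = 1.0e-4, 30.0
y_map = y0 / np.sqrt(1.0 + (theta / theta_core) ** 2)

# 25 arcsec FWHM filter, then the threshold quoted in the text.
y_obs = smooth(y_map, fwhm_pix=25.0 / pixel_arcsec)
size, flux = isophotal_observables(y_obs, 8.0e-6, pixel_arcsec)
print(size, flux)
```

A cluster whose filtered map never exceeds the threshold returns zero size and flux, which corresponds to the clusters lost at z = 1.5.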

Our data set $X_{Nm}$ is then a matrix with a number of
rows, N, equal to the number of observed clusters in the
survey, and a number of columns, m = 11, equal to the
number of observables for each cluster. When using PCA,
it is convenient to re-scale the observables in order to make
them of the same order of magnitude. Here we use the log of
the observables and then solve for the principal components
of the covariance matrix, $S = X_{Nm} \times X_{Nm}^{T}$. We find
that the first three principal components are responsible for
~90% of the dispersion in our data. That is, 3 principal
components contain almost all the information within our 11
original observables.
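
A minimal sketch of this step in Python is given below. It is not the code used for this work: the data are a synthetic stand-in for the N x 11 matrix, and the mean-centring, the 1/(N - 1) normalization, and the arrangement of the product so that the covariance matrix is m x m are assumptions of the sketch.

```python
import numpy as np

def pca_log_observables(observables):
    """PCA of log10(observables); rows are clusters, columns observables.

    Returns the eigenvalues (descending), the eigenvectors (as columns),
    the fraction of variance per component, and the projected data.
    """
    log_x = np.log10(observables)
    centred = log_x - log_x.mean(axis=0)
    cov = centred.T @ centred / (centred.shape[0] - 1)    # m x m covariance
    eigvals, eigvecs = np.linalg.eigh(cov)                # ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    fractions = eigvals / eigvals.sum()
    scores = centred @ eigvecs                            # principal components
    return eigvals, eigvecs, fractions, scores

# Synthetic stand-in for the data matrix (not the simulated clusters):
# 11 positive observables driven by 3 latent variables plus small noise.
rng = np.random.default_rng(1)
latent = rng.normal(size=(100, 3))
mixing = rng.normal(size=(3, 11))
noise = 0.1 * rng.normal(size=(100, 11))
observables = 10.0 ** (latent @ mixing + noise)

eigvals, eigvecs, fractions, scores = pca_log_observables(observables)
print(np.round(100.0 * np.cumsum(fractions[:3]), 1))
```

The columns of `scores` are uncorrelated by construction, which is the orthogonality property exploited when the individual likelihoods are combined.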

Table 1 contains the first three eigenvectors with their
associated eigenvalues ($\lambda$) and percentages. The form of
the eigenvectors clearly shows which of the 11 observables are
the most relevant. The first principal component (with the
highest eigenvalue), PC1, is dominated by the isophotal flux,
the isophotal angular size, the perimeter and the first
derivative. The second principal component is dominated by the
central amplitude, the MHW coefficients and the second
derivative, and the third principal component is dominated
by the ellipticity and the number of subgroups. It is not
surprising to see that the flux and size contribute
significantly to the most relevant principal component (PC1).
In contrast, the ellipticity and number of subgroups
contribute significantly only to the third principal component.

                 PC1      PC2      PC3
$\lambda$        5.15     3.56     1.03
Percentage       46.8     32.4      9.4

Observables           Eigenvectors
Isoph. Flux     -0.97    -0.21    -0.04
Isoph. Size     -0.98    -0.11    -0.05
Central Amp.    -0.38    -0.88     0.01
$d^2$            0.75    -0.62    -0.03
$d$              0.81    -0.47     0.07
perimeter       -0.97    -0.11    -0.16
ellipticity      0.18    -0.2     -0.66
groups           0.21    -0.1     -0.74
MHW1 (0.25)      0.62    -0.76    -0.02
MHW2 (0.75)      0.07    -0.96    -0.02
MHW3 (1.66)     -0.65    -0.71    -0.01

Table 1. First 3 eigenvectors of the principal component analysis
(columns) and associated eigenvalues (first row) of the 11
observables outlined in §4.2. The numbers in parentheses are the
typical scales of the MHWs in arcmin. The second row gives the
associated percentage for each one of the eigenvectors. The
principal components are each a linear combination of the
observables; the coefficients of the combination are listed.
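
The percentages quoted in Table 1 can be recovered from the eigenvalues if the total variance is taken to be 11, one unit per observable, as would be the case for observables standardized to unit variance. That normalization is inferred here from the numbers and is not stated explicitly in the text.

```python
eigenvalues = [5.15, 3.56, 1.03]   # first row of Table 1
n_observables = 11                 # assumed total variance

percentages = [round(100.0 * lam / n_observables, 1) for lam in eigenvalues]
total = round(sum(100.0 * lam / n_observables for lam in eigenvalues), 1)

print(percentages)   # [46.8, 32.4, 9.4]
print(total)         # 88.5
```

The three components together account for about 88.5 % of the variance, consistent with the ~90 % quoted above.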

In Figure 5 the data set is projected in the space of the
three principal components. We have used a spectral scale
of colors: blue points are clusters at low redshift,
green and yellow points are at intermediate redshift and red
points are at high redshift. As can be seen, different redshifts
are grouped in different regions of this 3D space. This will
allow us to discriminate between low, intermediate and high
redshift clusters. Figure 6 contains the projection of the
original data set in the space defined by the first and second
principal components only. The grouping of clusters as a
function of their redshift can also be appreciated
in this space.

Figure 6. Projection in the space of the first and second
principal components. (+)'s are for z = 0.05, (*)'s for z = 0.1,
(x)'s for z = 0.25, diamonds for z = 0.5, triangles for z = 0.75,
squares for z = 1.0 and (+)'s again for z = 1.5.

To estimate the redshift we use the expression given in
Eqn. 17, which requires the quantities $\overline{pc}_i(z)$ and
$\sigma_{pc_i}(z)$; these must be estimated from the training set.
In our case the simulated clusters lie at discrete redshifts
(z = 0.05, 0.1, 0.25, 0.5, 0.75, 1.0, and 1.5), and so we compute
$\overline{pc}_i(z)$ and $\sigma_{pc_i}(z)$ only at those redshifts.
More than 10 clusters per redshift interval are needed in order
to properly estimate $\overline{pc}_i(z)$ and $\sigma_{pc_i}(z)$.
The total number of clusters in our simulations is ~100 and
there are about 10-15 clusters at each one of the discrete
redshifts. Due to this low number of clusters, we have taken
the training set to be coincident with the total sample of
clusters. Since we have discrete redshifts in our simulation,
we have to interpolate $\overline{pc}_i(z)$ and $\sigma_{pc_i}(z)$
for arbitrary redshift. In a real survey the situation
could be much better if redshifts were available for a larger
number of clusters with a more continuous distribution in z.
Then the training set could be larger, with no need
for interpolation.
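
The calibration and interpolation step can be sketched as follows. The training values are synthetic and the trend of the component with redshift is invented; only the list of redshifts is taken from the text. Linear interpolation is an assumption of this sketch, since the interpolation scheme is not specified.

```python
import numpy as np

def calibrate(pc_values, redshifts):
    """Mean and dispersion of one principal component per training redshift."""
    z_nodes = np.unique(redshifts)
    means = np.array([pc_values[redshifts == z].mean() for z in z_nodes])
    sigmas = np.array([pc_values[redshifts == z].std(ddof=1) for z in z_nodes])
    return z_nodes, means, sigmas

def interpolate(z, z_nodes, values):
    """Linear interpolation between the discrete training redshifts."""
    return np.interp(z, z_nodes, values)

# Synthetic training set: 12 clusters at each of the 7 simulated redshifts.
rng = np.random.default_rng(2)
z_sim = np.array([0.05, 0.1, 0.25, 0.5, 0.75, 1.0, 1.5])
redshifts = np.repeat(z_sim, 12)
# Made-up redshift trend plus scatter for the first principal component.
pc1 = -3.0 * np.log10(redshifts) + 0.3 * rng.normal(size=redshifts.size)

z_nodes, means, sigmas = calibrate(pc1, redshifts)

# Mean and dispersion evaluated on a fine redshift grid.
z_grid = np.linspace(0.05, 1.5, 30)
mean_grid = interpolate(z_grid, z_nodes, means)
sigma_grid = interpolate(z_grid, z_nodes, sigmas)
print(np.round(mean_grid[:3], 2), np.round(sigma_grid[:3], 2))
```

With a larger and more continuous training set, the interpolation could be replaced by a direct estimate in narrow redshift bins.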

Once we have $\overline{pc}_i(z)$ and $\sigma_{pc_i}(z)$ for
different redshifts we can apply the Bayesian estimator (see
Eqn. 17) for each one of the remaining clusters. Figure 7 shows
the final result of our method. The mean of the recovered
redshifts follows the true redshift of the clusters very well.
The error bars are small at low redshifts but they grow larger
at higher redshift. We also show the result obtained when only
the first principal component is considered in the analysis
(the point at z = 0.05 is not shown). In this case, the error
bars and bias are smaller in the high redshift interval but they
are larger at the lower redshifts. If we look at Figure 6
and project the points onto PC1 and PC2, we see that
the projection onto PC2 is noisier, in the sense that the
overlap of the different redshift intervals is stronger than in
the projection onto PC1. This overlap affects the individual
likelihoods (Eq. 18), which can show a bimodal or trimodal
behaviour, especially at high redshift (i.e. the individual
likelihood for PC2 (and PC3) has local maxima at different
redshifts). The addition of the second and third principal
components in the analysis adds noise to the z-estimation,
in particular in the high redshift interval. In contrast,
at low redshifts, the second and especially the third principal
component (see Figure 5) show a clear dependence on
redshift, which helps to better estimate z. Consequently the
redshift estimation becomes noisier when we include
the second and third principal components at high redshift
but, in the low-z interval, the second and third principal
components help to reduce the error bars.
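
A minimal sketch of how such an estimator could be evaluated on a redshift grid is given below. It assumes a flat prior, a Gaussian likelihood for each principal component without a normalization prefactor, and that the likelihoods of the individual components simply multiply. The calibration arrays are invented for illustration; only the list of redshifts is taken from the text.

```python
import numpy as np

# Hypothetical calibration of two principal components at the training
# redshifts (all means and dispersions below are made up).
z_nodes = np.array([0.05, 0.1, 0.25, 0.5, 0.75, 1.0, 1.5])
pc_mean = np.array([[4.0, 3.0, 1.8, 0.9, 0.4, 0.0, -0.5],     # PC1
                    [1.5, 1.0, 0.4, 0.1, 0.0, 0.0, -0.1]])    # PC2
pc_sigma = np.array([[0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4],
                     [0.3, 0.3, 0.4, 0.5, 0.6, 0.6, 0.6]])

def redshift_posterior(pc_obs, z_grid, n_components):
    """Posterior weights on a uniform z_grid for a flat prior."""
    log_post = np.zeros_like(z_grid)
    for i in range(n_components):
        mean = np.interp(z_grid, z_nodes, pc_mean[i])
        sigma = np.interp(z_grid, z_nodes, pc_sigma[i])
        log_post += -0.5 * ((pc_obs[i] - mean) / sigma) ** 2  # Gaussian exponent
    post = np.exp(log_post - log_post.max())                  # avoid underflow
    return post / post.sum()                                  # weights sum to 1

z_grid = np.linspace(0.05, 1.5, 581)
pc_obs = np.array([0.9, 0.1])   # a cluster lying on the z = 0.5 calibration

# Compare the estimate from PC1 alone with the one from PC1 and PC2.
for n in (1, 2):
    post = redshift_posterior(pc_obs, z_grid, n)
    z_mean = np.sum(post * z_grid)
    z_std = np.sqrt(np.sum(post * (z_grid - z_mean) ** 2))
    print(n, round(z_mean, 2), round(z_std, 2))
```

Because the second component in this invented calibration is nearly flat at high redshift, it adds little information there, which mimics the behaviour described above.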

To understand this behaviour, it is helpful to study how
different observables contribute to the redshift estimation. We
have split the list of observables into two groups and applied
PCA to each group. In the first group we include three of
the most relevant observables (central amplitude, isophotal
flux, and isophotal size), while in the second group we include
the remaining 8 observables (first and second derivatives,
3 MHW coefficients, ellipticity, number of subgroups and
perimeter). We compare the results in Figure 8. The first
group renders a z-estimation similar to the case where only
PC1 is used in the analysis (see Figure 7). This is not
surprising since PC1 was dominated by the isophotal flux
and isophotal size. This result shows that even with a small
number (3) of observables it is possible to get an estimate
of the redshift. However, the other observables also contain
information about the redshift, as illustrated in
Figure 8 (dotted line). In this case it is important to note
how the 8 additional observables help to reduce the scatter
in the low redshift interval. Thus it can be useful to include
more observables in the analysis to reduce the uncertainty.
However, the additional 8 observables increase the scatter
at higher redshifts.

Our results show that morphological redshifts are not 
precise estimators of the cluster redshift, but they are useful
in providing a first guess that could be critical in planning
the cluster follow-up observations to determine photometric
or spectroscopic redshifts. Also, we note that the redshift 
distribution expected for cluster surveys does not contain 
any sharp features in redshift, suggesting that even moder- 
ately accurate redshifts like those possible with morpholog- 
ical estimators may be sufficient for deriving cosmological 
constraints. This clearly deserves further attention. 



5 CONCLUSIONS 

We have developed a means of estimating galaxy cluster 
redshifts using only observed SZE properties of the clusters. 
Using a toy model we show how morphological quantities 
associated with clusters may contain redshift information. We
also show how modelling of the morphological quantities can 
lead to systematic errors in the redshift estimation. We then 
propose an alternative method which is model independent. 
Specifically, we have combined several redshift sensitive SZE 
observables using a standard principal component analysis 
(PCA). The PCA led to significant compression, showing
that most of the redshift information contained in the 11 
SZE observables can be expressed in three orthogonal linear 
combinations. The use of the PCA has several advantages. 
These include (i) no required assumptions about cluster scaling
relations, (ii) straightforward use of direct observables
(like the isophotal quantities), and (iii) orthogonality of the
principal components. The method must be calibrated, and 
we suggest using a cluster training set that has redshift esti- 
mates from photometric or spectroscopic means. This train- 
ing set is required to build the likelihoods of the principal 
components as a function of redshift. 

In our analysis we include 11 different observables:
isophotal flux, isophotal size, central amplitude, second
derivative at the center, the mean of the first derivative in
the region where the second derivative vanishes, the
ellipticity and perimeter of the isophote, the number of
subgroups above the isophote, and three Mexican-hat wavelet
coefficients evaluated at the cluster center. Principal
components were determined, and the first three components
account for ~90 % of the variance of the data. Application of
our redshift estimator using these three components indicates
that the method can distinguish between clusters at low,
intermediate and high redshift.

Figure 7. Mean recovered redshift and error bar (dispersion) as
a function of redshift. The solid line represents the ideal situation
where the recovered redshift equals the true one. For comparison
we also show the corresponding recovered redshifts when only the
first principal component is used in the Bayesian approach (dotted
error bars). The error bars for this case have been displaced 0.05
units in redshift to the right.

Although the error bar for a specific cluster redshift is 
fractionally large, our method should be useful for future 
SZE surveys, providing a preselection of low, intermediate 
and high redshift clusters. This preselection can be used to 
optimize the optical follow-up. Because of the smoothly varying
nature of the cluster redshift distribution expected in
future surveys, it may also be possible to obtain cosmological
constraints directly with these morphological redshifts.
As shown in Fan & Chiueh (2001), the ratio of the number
of clusters above and below a given redshift can be a useful 
cosmological discriminator. This kind of analysis could be 
well suited to our morphological redshift estimates. 

Morphological redshifts require resolved SZE cluster images.
Our estimates were carried out assuming an instrument
resolution of 25 arcsec. This resolution
requirement makes our method inappropriate for applica- 
tion to clusters detected in the Planck Surveyor mission, 
but there are several planned interferometric and single dish 
SZE surveys which could take advantage of our method. 

Although in this work we have only considered the case
of the SZE, our method can be extended to X-ray and optical
cluster surveys. The main difference would be that the flux
in the X-ray and optical bands is inversely proportional to
the luminosity distance squared, and the region of the
spectrum observed by a particular instrument also varies with
redshift. The luminosity distance and the angular diameter
distance differ by a factor $(1 + z)^2$. In general, the X-ray
and optical flux is much more sensitive to the cluster redshift
than is the SZE flux. Although the redshift of galaxy clusters
in X-rays can be obtained, for some clusters, directly from
their X-ray spectrum (with typical errors of
$\Delta z \approx 0.2$), for many clusters with a low SNR the
redshift cannot be obtained with this method. Large, planned
X-ray surveys will have a preponderance of low signal-to-noise
detections, making the use of morphological redshifts (alone
or combined with photometric redshifts) very promising.

Figure 8. Recovered vs true redshift in the case where only three
observables (central amplitude, isophotal flux, and isophotal size)
are considered in the PCA analysis (solid line). This result is
comparable with the one obtained when the 11 observables are
considered but only the first PC is used in the redshift
estimation. The dashed lines show the corresponding redshift
estimation when the remaining 8 observables are considered in the
analysis (first and second derivatives, 3 MHW coefficients,
ellipticity, number of subgroups and perimeter) and the central
amplitude, isophotal flux, and isophotal size are excluded. The
true redshift has been displaced 0.05 to the right to avoid
overlapping. Note the good constraints on z obtained by these
eight observables at low z.